PREFACE


Part of the Project RAND research program consists
of basic supporting studies in mathematics. The
mathematical research presented here concerns control
theory and in particular the relation between optimal
policies for control over a finite and an infinite time
interval.
SUMMARY

The purpose of this paper is to describe some
general problems concerning the asymptotic behavior of
control processes as the time interval becomes infinite,
to present some partial results in the general case, and
to provide a detailed analysis of a one-dimensional
control process.
ASYMPTOTIC  CONTROL  THEORY
by

Richard Bellman and Richard Bucy

1.  INTRODUCTION. 

In recent years the mathematical theory of control
has received an increasing amount of attention. New
theories have been developed and older theories have been
refined and extended [1,2,3,4,5,6].

In this paper, we wish to initiate discussion of a
problem in the calculus of variations which has not had
the attention due it in the classical literature. The
problem is concerned with the asymptotic behavior of the
solution of a variational problem as the time interval
becomes infinite. From the standpoint of control theory,
and more generally from the standpoint of dynamic programming,
this is a very natural type of behavior to study.
In many significant cases, the "steady-state" policy is
simpler conceptually, analytically and computationally.

We shall consider the minimization of the functional

(1.1)  $J(u) = \frac{1}{2} \int_0^T (u^2 + L(x))\,dt$

over all functions $u$ where

(1.2)  $\dot{x} = f(x) + u, \quad x(0) = c$.

Let $V(c,T) = \min_u J(u)$. For finite and sufficiently
small $T$ the classical calculus of variations, or
dynamic programming, applies, under certain reasonable
assumptions on $L$ and $f$. We shall be interested,
however, in the following questions:

(1)  When  does  the  problem  for  infinite  T  make 
sense? 

(2)  When it does, are the optimal motions and
policies for infinite T the limits of the
corresponding optimal motions and policies
for finite T?

(3)  What  is  the  effect  of  using  steady-state 
optimal  policy  for  the  finite  problem? 

This is an example of what we mean by asymptotic control
theory.

For example, if $f = 0$ and $L = x^2 + \frac{1}{2} x^4$, the
problem is that of minimizing the functional

(1.3)  $J(u) = \frac{1}{2} \int_0^T [\dot{x}^2 + x^2 + \frac{1}{2} x^4]\,dt$

over all curves for which $x(0) = c$. The Euler
equation is

(1.4)  $\ddot{x} - x - x^3 = 0$,

subject to the two-point boundary conditions

(1.5)  $x(0) = c, \quad \dot{x}(T) = 0$.

Establishing the existence and uniqueness of
solutions of (1.4) and determining the asymptotic
behavior as $T \to \infty$ is analogous to the classical
problem of Poincaré-Lyapunov, but materially more
difficult because of the two-point boundary-value
condition.
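To make the two-point character of the problem concrete, the following is a minimal numerical sketch, not part of the original analysis. It assumes the Euler equation $\ddot{x} = x + x^3$ with $x(0) = c$ and the free-end condition $\dot{x}(T) = 0$, and finds the unknown initial slope by bisection (a simple shooting method). The function names, the values $c = 1$, $T = 1$, and the slope bracket $[-2, 0]$ are illustrative assumptions.

```python
def rhs(x, v):
    # Euler equation as a first-order system: x' = v, v' = x + x^3
    return v, x + x**3

def integrate(c, slope, T, steps=2000):
    """Classical RK4 from t = 0 to t = T; returns (x(T), x'(T))."""
    h = T / steps
    x, v = c, slope
    for _ in range(steps):
        k1x, k1v = rhs(x, v)
        k2x, k2v = rhs(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = rhs(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = rhs(x + h * k3x, v + h * k3v)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return x, v

def shoot(c, T, lo=-2.0, hi=0.0, iters=60):
    """Bisect on the initial slope x'(0) until x'(T) = 0.

    The bracket [lo, hi] is an assumption suited to c = 1, T = 1:
    slope 0 ends with x'(T) > 0, slope -2 ends with x'(T) < 0.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        _, vT = integrate(c, mid, T)
        if vT > 0:
            hi = mid   # slope not negative enough
        else:
            lo = mid
    return 0.5 * (lo + hi)

c, T = 1.0, 1.0
slope = shoot(c, T)
xT, vT = integrate(c, slope, T)
print(slope, xT, vT)
```

Shooting is workable here only for moderate $T$: the equation is unstable in forward time, so the terminal slope becomes increasingly sensitive to the initial slope as $T$ grows, which is one practical face of the difficulty mentioned above.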

We shall first, using quite general arguments, show
that $V(T,c)$ is monotone increasing as a function of $T$,
and uniformly bounded under mild restrictions concerning
$L(x)$. Taking advantage of the fact that the Euler
equation possesses a first integral, we can analyze the
behavior of the solution in detail as $T \to \infty$.

This analysis shows that the formal asymptotic
series obtained from the partial differential equation

(1.6)  $V_T = \min_u [\frac{1}{2}(u^2 + L(x)) + V_x (ax + u)]$,

an equation derived from dynamic programming considerations
which yields the Hamilton-Jacobi equation relevant to the
variational problem when $f(x) = ax$ ([1] and [12]), is an
actual asymptotic series for $V(c,T)$. This corresponds
to the result easily derived in the case where the
integrand in (1.1) is merely quadratic in $x$ and $u$.
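For orientation, the quadratic case can be worked out in closed form. The sketch below assumes $L(x) = x^2$, $f(x) = ax$, and the initial condition $V(c,0) = 0$; these normalizations are our choice for illustration and are not taken from the text. Trying $V(c,T) = \frac{1}{2} p(T) c^2$ in the dynamic programming equation gives a scalar Riccati equation whose solution approaches its limit exponentially fast:

```latex
\begin{align*}
V_T &= \min_u \left[ \tfrac12 (u^2 + c^2) + V_c (ac + u) \right]
     = \tfrac12 c^2 + a c V_c - \tfrac12 V_c^2, \\
V(c,T) &= \tfrac12 p(T)\, c^2
  \;\Longrightarrow\; p'(T) = 1 + 2 a p - p^2, \qquad p(0) = 0, \\
p(T) &\to p_\infty = a + \sqrt{a^2 + 1} \qquad (T \to \infty), \\
p_\infty - p(T) &= O\!\left( e^{-2\sqrt{a^2+1}\,T} \right).
\end{align*}
```

The decay rate comes from linearizing the Riccati equation at $p_\infty$, where the derivative of the right-hand side is $2a - 2p_\infty = -2\sqrt{a^2+1}$.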


In the concluding section, we shall mention some
open and apparently quite difficult questions in connection
with asymptotic behavior and give some references to
analogous results obtained for dynamic programming
processes by Kalman and Bucy [6], Beckwith [7], Iglehart
[8], Freimer [9], and Bellman [10].

2.  MONOTONICITY  AND  BOUNDEDNESS.

Let us introduce the function

(2.1)  $V(c,T) = \min_u J(u)$,

(with the assumption that $f(x) = ax$). Let $x(t,T)$,
$u(t,T)$ represent the functions that furnish the minimum
of $J(u)$ under the assumption that $L(x)$ is a nonnegative
entire function of $x$. In most processes of interest
$L(x)$ is a polynomial in $x$.

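Since

(2.2)  $V(c,T+\Delta) = \frac{1}{2}\int_0^T (u^2 + L(x))\,dt + \frac{1}{2}\int_T^{T+\Delta} (u^2 + L(x))\,dt \ge V(c,T) + \frac{1}{2}\int_T^{T+\Delta} (u^2 + L(x))\,dt$,

where the integrands are evaluated along $x = x(t,T+\Delta)$,
$u = u(t,T+\Delta)$ and the last integral is nonnegative,
we see that $V(c,T)$ is monotone increasing in $T$.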

To show uniform boundedness in $T$, for fixed $c$,
let us choose an appropriate control policy, say

(2.3)  $u = 0$ when $a < 0$,
       $u = -2ax$ when $a > 0$,
       $u = -x$ when $a = 0$.

In each case, we see that $x = ce^{-bt}$ with $b$ positive.
Hence,

(2.4)  $J(u) = \frac{1}{2}\int_0^T [O(e^{-2bt}) + L(ce^{-bt})]\,dt$.

Under the assumption that $L(x) = O(x)$ as $x \to 0$, the
integral is uniformly bounded as $T \to \infty$.

Having established boundedness and monotonicity as
$T \to \infty$, we can assert convergence,

(2.5)  $V(c,T) \to V(c)$

as $T \to \infty$.

It is not settled, however, whether or not the
states $x(t,T)$ and the policies $u(t,T)$ converge as
$T \to \infty$. The foregoing argument extends to quite general
situations, but leaves unanswered the interesting and
important questions concerning the convergence of
policies.
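The monotonicity and boundedness argument can be checked numerically on a toy case. The sketch below assumes $a = 0$ and $L(x) = x^2$ (our choice, not the paper's general setting), for which the minimum of $\frac{1}{2}\int_0^T (u^2 + x^2)\,dt$ with $\dot{x} = u$ is known in closed form, $V(c,T) = \frac{1}{2}c^2 \tanh T$. It compares this value with the cost of the suboptimal policy $u = -x$ from (2.3).

```python
import math

def value(c, T):
    # Closed-form minimum of (1/2) * integral_0^T (u^2 + x^2) dt, x' = u, x(0) = c
    return 0.5 * c * c * math.tanh(T)

def cost_of_policy(c, T):
    # Cost of the suboptimal policy u = -x: x = c e^{-t}, u = -c e^{-t}
    return 0.5 * c * c * (1.0 - math.exp(-2.0 * T))

c = 2.0
horizons = [0.25 * k for k in range(1, 41)]
values = [value(c, T) for T in horizons]
bounds = [cost_of_policy(c, T) for T in horizons]

# V(c, T) increases with T and never exceeds the cost of the trial policy,
# which in turn stays below its T = infinity limit c^2 / 2.
monotone = all(v1 < v2 for v1, v2 in zip(values, values[1:]))
bounded = all(v <= b <= 0.5 * c * c for v, b in zip(values, bounds))
print(monotone, bounded, values[-1])
```

In this toy case the limit $V(c) = \frac{1}{2}c^2$ happens to coincide with the bound supplied by the trial policy; in general the trial policy gives only an upper bound.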

3.  DETAILED  ANALYSIS. 

We will be interested in an explicit solution to
the partial differential equation

(3.1)  $V_T = \frac{1}{2} L(c) + acV_c - \frac{1}{2} V_c^2$

subject to the boundary conditions

(3.2)  $V(T,c)|_{T=0} = 0$, $a < 0$,
       $V(T,c)|_{T=0} = ac^2$, $a \ge 0$.

As is well known ([12]), existence of a sufficiently
smooth solution to (3.1) is a sufficient condition for
the variational problem (1.1) to have a solution.
Equation (3.1) is (1.6) with the minimization carried
out.
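For completeness, the minimization step can be written out. With $f(x) = ax$ and the state denoted by $c$, the bracket in (1.6) is quadratic in $u$, so its minimum is found by setting the derivative with respect to $u$ to zero:

```latex
\begin{align*}
\frac{\partial}{\partial u}\left[ \tfrac12 u^2 + V_c\,u \right] = u + V_c = 0
  &\;\Longrightarrow\; u = -V_c, \\
V_T = \tfrac12 V_c^2 + \tfrac12 L(c) + V_c\,(ac - V_c)
  &= \tfrac12 L(c) + a c V_c - \tfrac12 V_c^2 .
\end{align*}
```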

It will be assumed that $L$ satisfies the following
conditions:

(3.3)  (1)  $L$ is even, and positive.
       (2)  $L$ and $L_x$ are continuous and increasing
            for positive $x$.
       (3)  $L(x) = O(|x|)$ as $|x| \to 0$.
       (4)  $L$ is analytic.

Now the Cauchy-Kowalewski theorem implies (3.1) has a
unique local analytic solution ([11]).

With the aim of solving (3.1) we introduce the
function $y(c,T)$ which corresponds physically to the
final state of the controlled system along an optimal
trajectory initiating at $(c,0)$ and ending at $(y,T)$.
The following lemma shows that $y$ is well defined for
$c > 0$. The case $c < 0$ is similar. When $L(c) = c^2 + c^4$,
$y$ will be defined by an elliptic integral of the first
kind.


Lemma 1. Suppose $c > 0$, and assume conditions (3.3)
are fulfilled; then for every $K$, $\infty > K > 0$, there
exists a unique $0 < y \le c$ such that

(3.4)  $I(y) = \int_y^c \frac{dx}{\sqrt{a^2x^2 + L(x) - L(y)}} = K$.


Proof. Since $I$ is clearly continuous, elementary
bounding of $I(y)$ shows it takes on all finite positive
values as $y$ ranges over $(0,c]$. To show uniqueness
assume for some finite positive $K$ that there exist
$y_1$ and $y_2$, $c \ge y_1 > y_2 > 0$, where both satisfy (3.4);
then

(3.5)  $\int_{y_1}^c \frac{dx}{\sqrt{a^2x^2 + L(x) - L(y_1)}} = \int_{y_2}^c \frac{dx}{\sqrt{a^2x^2 + L(x) - L(y_2)}}$.

But (3.5) implies

(3.6)  $\int_{y_2}^{c-\Delta} \frac{dx}{\sqrt{a^2(x+\Delta)^2 + L(x+\Delta) - L(y_1)}} \ge \int_{y_2}^{c-\Delta} \frac{dx}{\sqrt{a^2x^2 + L(x) - L(y_2)}}$,

where $\Delta = y_1 - y_2 > 0$. It suffices to show (3.7) to
contradict (3.6):

(3.7)  $2a^2x\Delta + a^2\Delta^2 > L(x) - L(x+\Delta) + L(y_1) - L(y_2)$.

Consider the right-hand side of (3.7) for $x \ge y_1$. The
mean value theorem gives

$L(x+\Delta) - L(x) = L_x(\varphi)\Delta \ge L_x(y_1)\Delta$,  $\varphi \in (x, x+\Delta)$,

$L(y_1) - L(y_2) = L_x(\theta)\Delta \le L_x(y_1)\Delta$,  $\theta \in (y_2, y_1)$,

and for these $x$ (3.7) is satisfied. Then for
$y_2 < x < y_1$,

$L(x+\Delta) - L(y_1) = L_x(\delta)(x - y_2) \ge L_x(y_1)(x - y_2)$,

$L(x) - L(y_2) = L_x(\gamma)(x - y_2) \le L_x(x)(x - y_2) \le L_x(y_1)(x - y_2)$,

and (3.7) is satisfied for all $x \in (y_2, c - \Delta)$.


A closer examination of the previous lemma shows
$I(y)$ decreases strictly as $y$ increases through $(0,c]$
and hence $\partial I(y)/\partial y < 0$. Defining $y(c,T)$ as that
value of $y$ which satisfies

(3.8)  $\int_y^c \frac{dx}{\sqrt{a^2x^2 + L(x) - L(y)}} = T$,

it is easy to see by the implicit function theorem that
$y$ is differentiable in $c$ and $T$ on $(0,c] \times [0,\infty)$. Further,
$y$ is analytic. The following lemma characterizes the
behavior of $y$ as $T$ tends to infinity.

Lemma 2. Under the assumptions (3.3) and $a \ne 0$,
$y$ defined by (3.8) tends exponentially to zero as $T \to \infty$.

Proof.

$0 < T = \int_y^c \frac{dx}{\sqrt{a^2x^2 + L(x) - L(y)}} \le \int_y^c \frac{dx}{|a|x} = \frac{1}{|a|}\log\frac{c}{y}$,

or

$0 \le y \le ce^{-|a|T}$.
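The definition of $y(c,T)$ and the bound of Lemma 2 can be explored numerically. The following is a minimal sketch under stated assumptions: $L(x) = x^2 + x^4$, $a = -1$, $c = 1$, Simpson's rule in the variable $s = \log x$ for the integral in (3.8), and bisection in $y$. The function names are hypothetical, and the substitution $s = \log x$ is our device for keeping the integrand bounded by $1/|a|$, not something taken from the paper.

```python
import math

def L(x):
    # Assumed cost term for illustration
    return x**2 + x**4

def travel_time(y, c, a, n=1000):
    """I(y) = integral_y^c dx / sqrt(a^2 x^2 + L(x) - L(y)), Simpson in s = log x."""
    if y >= c:
        return 0.0
    lo, hi = math.log(y), math.log(c)
    h = (hi - lo) / n

    def g(s):
        x = math.exp(s)
        return x / math.sqrt(a * a * x * x + L(x) - L(y))

    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3.0

def final_state(c, T, a, iters=60):
    """Solve I(y) = T for y in (0, c] by bisection; I decreases as y increases."""
    lo, hi = 0.0, c
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if travel_time(mid, c, a) > T:
            lo = mid   # y too small: reaching it takes longer than T
        else:
            hi = mid
    return 0.5 * (lo + hi)

a, c = -1.0, 1.0
results = []
for T in (0.5, 1.0, 2.0, 3.0):
    y = final_state(c, T, a)
    bound = c * math.exp(-abs(a) * T)   # Lemma 2: y <= c e^{-|a| T}
    results.append((T, y, bound))
    print(f"T={T:3.1f}  y={y:.6f}  bound={bound:.6f}")
```

For this choice of $L$ the computed $y$ lies well below the bound, since the term $L(x) - L(y)$ only shortens the travel time.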

Now  we  will  characterize  the  solution  of  (3.1) 
subject  to  (3.2). 
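Theorem 1.* Equation (3.1) has the following analytic
solutions in the regions $c > 0$ and $c < 0$ under
assumptions (3.3). For $a < 0$,

(3.9)  $V(T,c) = \int_y^c [a\xi + \sqrt{a^2\xi^2 + L(\xi) - L(y)}]\,d\xi + \frac{T}{2}L(y)$,  $c > 0$,
       $V(T,c) = 0$,  $c = 0$,
       $V(T,c) = \int_c^y [-a\xi + \sqrt{a^2\xi^2 + L(\xi) - L(y)}]\,d\xi + \frac{T}{2}L(y)$,  $c < 0$,

while for $a > 0$,

(3.10)  $V(T,c) = \int_y^c [a\xi + \sqrt{a^2\xi^2 + L(\xi) - L(y)}]\,d\xi + \frac{T}{2}L(y) + ay^2$,  $c > 0$,
        $V(T,c) = 0$,  $c = 0$,
        $V(T,c) = \int_c^y [-a\xi + \sqrt{a^2\xi^2 + L(\xi) - L(y)}]\,d\xi + \frac{T}{2}L(y) + ay^2$,  $c < 0$,

where $y$ satisfies

$\int_y^c \frac{d\xi}{\sqrt{a^2\xi^2 + L(\xi) - L(y)}} = T$  for $c > 0$,

$-\int_y^c \frac{d\xi}{\sqrt{a^2\xi^2 + L(\xi) - L(y)}} = T$  for $c < 0$,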

* Replacing $a\xi$ and $(a\xi)^2$ by $f(\xi)$ and $f(\xi)^2$
with $f$ continuous, and $ay^2$ by $2\int_0^y f(\xi)\,d\xi$, provides
a solution to the equation $V_T = \frac{1}{2}L + V_c f(x) - \frac{1}{2}V_c^2$
which is local unless $y$ is defined for all $T$. Further,
Haar's uniqueness theorem ([13]) is applicable and implies
(3.9) for $c > 0$ is the unique solution of (3.1).
and $y$ is the same sign as $c$. Further,

(3.11)  $V(\infty,c) = \lim_{T\to\infty} V(T,c) = \int_0^c [a\xi + \sqrt{a^2\xi^2 + L(\xi)}]\,d\xi$,  $c > 0$,

        $V_c(\infty,c) = \lim_{T\to\infty} V_c(T,c) = ac + \sqrt{a^2c^2 + L(c)}$,  $c > 0$,

and the corresponding formulae for $c < 0$ and $c = 0$.
Finally, the optimum control law is

(3.12)  $u^o(t) = -ax(t) - \mathrm{sgn}\,x(t)\,\sqrt{a^2x(t)^2 + L(x(t)) - L(y)}$.

Proof. (3.9) and (3.10) follow from Lemma 1 and
direct substitution. (3.11) follows since for fixed $c$,
$V(c,T)$ and $V_c(c,T)$ are monotone in $T$ and uniformly bounded while
the explicit form follows from Lemma 2, (3.3)(3) and the
dominated convergence theorem. (3.12) is just the
principle of optimality.
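As a numerical illustration of (3.11) and (3.12) in the limit $T = \infty$, the sketch below simulates the feedback law with $y = 0$ (so that $L(y) = 0$) and compares the accumulated cost $\frac{1}{2}\int (u^2 + L(x))\,dt$ with the closed-form value $\int_0^c [a\xi + \sqrt{a^2\xi^2 + L(\xi)}]\,d\xi$. The choices $a = -0.5$, $L(x) = x^2 + x^4$, $c = 1.5$, the horizon 30, and all function names are assumptions made for illustration.

```python
import math

A = -0.5                          # assumed drift coefficient a in f(x) = a x

def L(x):
    return x**2 + x**4            # assumed cost term, even and analytic

def root(x):
    return math.sqrt(A * A * x * x + L(x))

def steady_state_control(x):
    # Feedback law (3.12) with y = 0 and L(0) = 0, i.e. the T = infinity policy
    return -A * x - math.copysign(root(x), x)

def closed_loop_cost(c, T, steps=20000):
    """RK4 on the pair (x, J): x' = a x + u, J' = (u^2 + L(x)) / 2."""
    def deriv(x):
        u = steady_state_control(x)
        return A * x + u, 0.5 * (u * u + L(x))

    h = T / steps
    x, J = c, 0.0
    for _ in range(steps):
        k1 = deriv(x)
        k2 = deriv(x + 0.5 * h * k1[0])
        k3 = deriv(x + 0.5 * h * k2[0])
        k4 = deriv(x + h * k3[0])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        J += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, J

def value_infinite(c, n=2000):
    """Simpson's rule for the c > 0 branch of (3.11)."""
    h = c / n
    f = lambda s: A * s + root(s)
    total = f(0.0) + f(c)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0

c = 1.5
x_end, J = closed_loop_cost(c, T=30.0)
V_inf = value_infinite(c)
print(J, V_inf, x_end)
```

The two numbers agree to within the integration error, which is what one expects if the steady-state policy attains $V(\infty,c)$ over a long horizon.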

Corollary 1. Under the previous assumptions, the
value of the $T$-infinite case $V(\infty,c)$ satisfies

$\frac{1}{2}L + V_c ac - \frac{1}{2}V_c^2 = 0$,

just (3.1) with $V_T = 0$.
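The corollary can be checked numerically for a sample cost. The sketch below assumes $L(x) = x^2 + x^4$ and a few sample values of $a$ and $c > 0$ (our choices), evaluates $V(\infty,c)$ from (3.11) by Simpson's rule, approximates $V_c$ by a central difference, and reports the residual of the stationary equation. The helper names are hypothetical.

```python
import math

def L(x):
    return x**2 + x**4   # assumed example

def V_inf(c, a, n=400):
    # Simpson's rule for integral_0^c [a s + sqrt(a^2 s^2 + L(s))] ds, c > 0
    h = c / n
    f = lambda s: a * s + math.sqrt(a * a * s * s + L(s))
    total = f(0.0) + f(c)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0

def hjb_residual(c, a, eps=1e-4):
    # Central difference for V_c, then the left-hand side of the stationary equation
    Vc = (V_inf(c + eps, a) - V_inf(c - eps, a)) / (2 * eps)
    return 0.5 * L(c) + a * c * Vc - 0.5 * Vc * Vc

residuals = [hjb_residual(c, a) for a in (-1.0, 0.5) for c in (0.5, 1.0, 2.0)]
print(max(abs(r) for r in residuals))
```

The residual is zero up to discretization error for both signs of $a$, in line with the algebraic identity obtained by substituting $V_c = ac + \sqrt{a^2c^2 + L(c)}$.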

4.  ASYMPTOTIC  BEHAVIOR.

As mentioned above, the principle of optimality
yields the partial differential equation

(4.1)  $V_T = \frac{1}{2}L(c) + acV_c - \frac{1}{2}V_c^2$.

It does not seem possible to obtain the asymptotic
behavior of $V$, even formally, by means of a series
of the form

(4.2)  $V = V_0(c) + V_1(c,T) + \cdots$,

without some additional information concerning the
analytic structure of $V_1$, e.g.,

(4.3)  $V_1(c,T) = v_1(c)u_1(T)$.

Here $V_0(c) = \lim_{T\to\infty} V(c,T)$.

We can, however, obtain an interesting bound for
$V(c,\infty) - V(c,T)$ in the following fashion. Consider
the expression

(4.4)  $V(c,\infty) = \min_u \int_0^\infty [x^2 + u^2 + L(x)]\,dt$.

Let $u(t,T)$, $x(t,T)$ denote the minimizing set of
functions for the interval $[0,T]$. Then, it is clear
that

(4.5)  $V(c,\infty) \le \int_0^T [u(t,T)^2 + x(t,T)^2 + L(x(t,T))]\,dt + \min_u \int_T^\infty [x^2 + u^2 + L(x)]\,dt$,

where in the second integral our choice of $u$ and $x$
is constrained only by the condition $x(T) = x(T,T)$.
Write $x(T,T) = x(c,T)$, the state of the system at
time $T$ starting in state $c$ at time 0 associated
with the finite variational process over $[0,T]$. Then
(4.5) yields the inequality

(4.6)  $V(c,\infty) \le V(c,T) + V(x(c,T),\infty)$.

Hence, we can obtain an estimate of the difference
between $V(c,\infty)$ and $V(c,T)$ if we obtain an estimate
for $x(c,T)$ as $T \to \infty$.

Observe that the estimate for $V(c,\infty)$ is readily
obtained by using a convenient approximate policy of the
type described in Sec. 2.
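The inequality (4.6) can be seen at work in a toy case where everything is available in closed form. The sketch below assumes $\dot{x} = u$ with running cost $\frac{1}{2}(u^2 + x^2)$ (a normalization chosen for illustration, not the integrand of (4.4)), for which $V(c,T) = \frac{1}{2}c^2\tanh T$, $V(c,\infty) = \frac{1}{2}c^2$, and the finite-horizon optimum is $x(t,T) = c\cosh(T-t)/\cosh T$.

```python
import math

def V_finite(c, T):
    # Minimum of (1/2) * integral_0^T (u^2 + x^2) dt with x' = u, x(0) = c
    return 0.5 * c * c * math.tanh(T)

def V_infinite(c):
    return 0.5 * c * c

def terminal_state(c, T):
    # x(T) along the finite-horizon optimum x(t) = c cosh(T - t) / cosh(T)
    return c / math.cosh(T)

c = 1.0
rows = []
for T in (0.5, 1.0, 2.0, 4.0, 8.0):
    gap = V_infinite(c) - V_finite(c, T)          # V(c, inf) - V(c, T)
    bound = V_infinite(terminal_state(c, T))      # V(x(c, T), inf)
    rows.append((T, gap, bound))
    print(f"T={T:4.1f}  gap={gap:.3e}  bound={bound:.3e}")
```

Here both the gap and the bound decay like $e^{-2T}$, so in this toy case the estimate captures the correct exponential rate and differs from the true gap only by a bounded factor.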

The estimate for $x(c,T)$ is not readily obtained
in general. Let us indicate how elementary arguments
yield the result for the problem of minimizing

(4.7)  $J(x) = \int_0^T [\dot{x}^2 + x^2 + x^4]\,dt$

where $x(0) = c$.

It is clear from the form of the integrand that if
$c > 0$, then $x$ is monotone decreasing. For, if as
indicated below, $x$ reached a turning point and started
to increase, we could replace it by the dotted curve,
obtaining obviously a smaller value of the integral:

[Figure: a trajectory x(t) that reaches a turning point and starts to increase, with the dotted replacement curve.]


The Euler equation is

(4.8)  $\ddot{x} - x - 2x^3 = 0, \quad x(0) = c, \quad \dot{x}(T) = 0$.

If $x$ decreases monotonically, the limit must be zero
as $T \to \infty$. From the Poincaré-Lyapunov theorem, we know
that all solutions of (4.8) which approach zero as
$t \to \infty$ have an asymptotic expansion of the form
$c_1e^{-t} + c_2e^{-2t} + \cdots$. Using this information in
conjunction with the preceding results, we readily
obtain an asymptotic series for $V(c,T)$ as $T \to \infty$.
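For the infinite-horizon limit, the decaying solution can be written down explicitly, which gives a quick check on the exponential expansion. The closed form used below is our own derivation, not a formula from the paper: the first integral of $\ddot{x} = x + 2x^3$ with $x \to 0$ gives $\dot{x} = -x\sqrt{1 + x^2}$, which is solved by $x(t) = 1/\sinh(t + t_0)$ with $\sinh t_0 = 1/c$. The sketch verifies this numerically and checks that $x(t)e^{t}$ tends to the constant $c_1 = 2e^{-t_0}$.

```python
import math

def x_inf(t, c):
    # Candidate infinite-horizon solution of x'' - x - 2x^3 = 0 with x(0) = c > 0
    t0 = math.asinh(1.0 / c)
    return 1.0 / math.sinh(t + t0)

def residual(t, c, h=1e-3):
    # Central second difference for x'' minus the right-hand side x + 2x^3
    x = x_inf(t, c)
    xpp = (x_inf(t + h, c) - 2.0 * x + x_inf(t - h, c)) / (h * h)
    return xpp - x - 2.0 * x**3

c = 2.0
worst = max(abs(residual(0.1 * k, c)) for k in range(1, 100))
c1 = 2.0 * math.exp(-math.asinh(1.0 / c))     # leading coefficient of e^{-t}
ratio = x_inf(12.0, c) * math.exp(12.0) / c1  # should approach 1
print(worst, ratio)
```

Expanding $1/\sinh$ in powers of $e^{-t}$ shows that only odd powers appear for this particular equation, so the coefficient $c_2$ in the general expansion vanishes here.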

5.  FURTHER  PROBLEMS. 

The technique we have used here to obtain the
asymptotic behavior of the state variables and the control
variable is quite special and does not extend to the
multidimensional case, to control processes with
constraints, to more general control processes involving
distributed parameters, to general stochastic control
processes, or to adaptive control processes. Some
partial results can be derived, but on the whole there
appears to be a need for a development of some new
techniques.

We feel that it is worthwhile in one case at least
to show that the expected results actually hold.

For asymptotic results in dynamic programming for
processes of quite different nature, see [7,8,9,10].


REFERENCES

1.  Bellman, R., Dynamic Programming, Princeton University
    Press, Princeton, New Jersey, 1957.

2.  Bellman, R., Adaptive Control Processes: A Guided Tour,
    Princeton University Press, Princeton, New Jersey,
    1961.

3.  Pontryagin, L. S., V. G. Boltyanskii, R. V. Gamkrelidze,
    and E. F. Mishchenko, The Mathematical Theory of
    Optimal Processes, Interscience Publishers,
    New York, 1962.

4.  Bellman, R., I. Glicksberg, and O. Gross, Some Aspects
    of the Mathematical Theory of Control Processes, The
    RAND Corporation, R-313, January 1958; Russian
    translation in Moscow, 1962.

5.  Bellman, R., Stability Theory of Differential
    Equations, McGraw-Hill Book Company, Inc., New York.

6.  Kalman, R. E., and R. S. Bucy, "New Results in Linear
    Filtering and Prediction Theory," ASME Journal of
    Basic Engineering, March 1961.

7.  Beckwith, R. E., Analytic and Computational Aspects
    of Dynamic Programming Processes of High Dimension,
    Ph.D. Thesis, Purdue University, June 1959.

8.  Iglehart, D., Ph.D. Thesis, Stanford University, 1960.

9.  Freimer, M., A Dynamic Programming Approach to
    Adaptive Control Processes, Lincoln Lab. Report
    54-5, 1959.

10. Bellman, R., "A Markovian Decision Process," J. Math.
    and Mech., Vol. 6, 1957, pp. 679-684.

11. Petrovsky, I. G., Partial Differential Equations,
    Interscience Publishers, New York, 1954.

12. Kalman, R. E., "The Theory of Optimal Control and the
    Calculus of Variations," Mathematical Optimization
    Techniques, University of California Press,
    Berkeley, California, 1963, pp. 309-331.

13. Courant, R., and D. Hilbert, Methods of Mathematical
    Physics, Interscience Publishers, New York, 1962.


